Abstract: We determine the loss in capacity incurred by
using signal constellations with a bounded support over general
complex-valued additive-noise channels for suitably high
signal-to-noise ratio. Our expression for the capacity loss recovers the
power loss of 1.53 dB for square signal constellations.

I. Introduction 

As is well known, the channel capacity of the complex-valued Gaussian channel with input power at most $P$ and noise variance $\sigma^2$ is given by [1]

$$C_G(P,\sigma) = \log\Bigl(1 + \frac{P}{\sigma^2}\Bigr). \tag{1}$$



Although inputs distributed according to the Gaussian distri- 
bution attain the capacity, they suffer from several drawbacks 
which prevent them from being used in practical systems. 
Among them, especially relevant are the unbounded support 
and the infinite number of bits needed to represent signal 
points. 

In practice, discrete distributions with a bounded support 
are typically preferred — in this case, the number of points 
is allowed to grow with the signal-to-noise ratio (SNR). 
Ungerboeck computed the rates that are achievable over the 
Gaussian channel when the channel input takes value in a 
finite constellation 0. He observed that, when transmitting 
at a rate of R bits per channel use, there is not much to 
be gained from using constellations with size $N$ larger than
$2^{R+1}$. Ozarow and Wyner provided an analytic confirmation
of Ungerboeck's observation by deriving a lower bound on the 
rates achievable with finite constellations 0. In both works, 
the channel inputs are assumed to be uniformly distributed on 
a lattice within some enclosing boundary, where the size of 
the boundary is scaled in order to ensure unit input-power. 

A related line of work considered signal constellations 
with favorable geometric properties, e.g., minimum Euclidean 
distance or minimum average error probability. For signal 
constellations with a large number of points, i.e., dense constellations,
Forney et al. estimated the loss in SNR with
respect to the Gaussian input to be $10\log_{10}\frac{\pi e}{6} \approx 1.53$ dB by
comparing the volume of an n-dimensional hypercube with 
that of an n-dimensional hypersphere of identical average 
power. Later, Ungerboeck's work led to the study of multi- 
dimensional constellations based on lattices 0-0. 

The research leading to these results has received funding from the 
European Community's Seventh Framework Programme (FP7/2007-2013) 
under grant agreement No. 252663 and from the European Research Council 
under ERC grant agreement 259663. 



Recently, Wu and Verdu have studied the information rates 
that are achievable over the Gaussian channel when the input 
takes value in a finite constellation with N signal points 0. 
For every fixed SNR, they show that the difference between 
the capacity and the achievable rate tends to zero exponentially 
in N. For the optimal constellation, the peak-to-average-power 
ratio grows linearly with N, inducing no capacity loss. This 
is in contrast to the constellations considered by Ungerboeck 
and Ozarow and Wyner 0, which have a finite peak-to- 
average-power ratio. 

In this work, we adopt an information-theoretic perspective 
to study the capacity loss incurred by signal constellations with 
a bounded support over the Gaussian channel for sufficiently 
small noise variance. In particular, we use the duality-based 
upper bound on the mutual information in [10] to provide a
lower bound on the capacity loss. The results are valid for both 
peak- and average-power constraints and generalize directly 
to other additive-noise channel models. For sufficiently high 
SNR, our results recover the power loss of 1.53dB for square 
signal constellations without invoking geometrical arguments. 

II. Channel Model and Capacity 

We consider a discrete-time, complex-valued additive-noise channel, where the channel output $Y_k$ at time $k \in \mathbb{Z}$ (where $\mathbb{Z}$ denotes the set of integers) corresponding to the time-$k$ channel input $x_k$ is given by

$$Y_k = x_k + \sigma W_k, \quad k \in \mathbb{Z}. \tag{2}$$

We assume that $\{W_k,\ k \in \mathbb{Z}\}$ is a sequence of independent and identically distributed, centered, unit-variance, complex random variables of finite differential entropy. We further assume that the distribution of $W_k$ depends neither on $\sigma > 0$ nor on the sequence of channel inputs $\{x_k,\ k \in \mathbb{Z}\}$.
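The channel law (2) can be simulated directly. The following is a minimal sketch, assuming circularly-symmetric Gaussian noise (the model itself allows any centered, unit-variance noise of finite differential entropy); the function name `simulate_channel` and the 4-point input alphabet are hypothetical choices made only for illustration.

```python
import math
import random

def simulate_channel(inputs, sigma, rng):
    """Return outputs Y_k = x_k + sigma * W_k for complex inputs x_k.

    Assumption: W_k is circularly-symmetric complex Gaussian with unit
    variance (real and imaginary parts each have variance 1/2).
    """
    s = math.sqrt(0.5)
    return [x + sigma * complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            for x in inputs]

rng = random.Random(0)
# Hypothetical inputs: four points inside the square with A = 1.
points = [complex(a, b) for a in (-0.5, 0.5) for b in (-0.5, 0.5)]
xs = [rng.choice(points) for _ in range(20000)]
ys = simulate_channel(xs, sigma=0.1, rng=rng)

# The empirical noise power E[|Y - x|^2] should be close to sigma^2 = 0.01.
noise_power = sum(abs(y - x) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(noise_power)
```

The parameter $\sigma$ scales a unit-variance noise sample, so the noise power seen at the output is $\sigma^2$.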

The channel inputs take value in the set $\mathcal{S}$, which is assumed to be a bounded Borel subset of the complex numbers $\mathbb{C}$. We further assume that $\mathcal{S}$ has positive Lebesgue measure and that $0 \in \mathcal{S}$.

The set $\mathcal{S}$ can be viewed as the region that limits the signal points. For example, for a square signal constellation, it is a square:

$$\mathcal{S}_{\square} = \{x \in \mathbb{C}\colon -A < \mathrm{Re}(x) < A,\ -A < \mathrm{Im}(x) < A\} \tag{3}$$

for some $A > 0$. Here $\mathrm{Re}(x)$ and $\mathrm{Im}(x)$ denote the real and imaginary part of $x$, respectively. Similarly, for a circular signal constellation,

$$\mathcal{S}_{\circ} = \{x \in \mathbb{C}\colon |x| < R\}, \quad \text{for some } R > 0. \tag{4}$$



We study the capacity of the above channel under an average-power constraint $P$ on the inputs. Since the channel is memoryless, it follows that the capacity $C_{\mathcal{S}}(P,\sigma)$ (in nats per channel use) is given by

$$C_{\mathcal{S}}(P,\sigma) = \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} I(X;Y) \tag{5}$$

where the supremum is over all input distributions with essential support in $\mathcal{S}$ that satisfy $\mathrm{E}[|X|^2] \le P$.

We focus on $C_{\mathcal{S}}(P,\sigma)$ in the limit as the noise variance $\sigma^2$ tends to zero. In particular, we study the capacity loss, which we define as

$$L \triangleq \lim_{\sigma \downarrow 0} \bigl\{ C_{\mathbb{C}}(P,\sigma) - C_{\mathcal{S}}(P,\sigma) \bigr\}. \tag{6}$$

(Theorem 1 ahead asserts the existence of the limit.) Here $C_{\mathbb{C}}(P,\sigma)$ denotes the capacity of the above channel when the support constraint $\mathcal{S}$ is relaxed, i.e.,

$$C_{\mathbb{C}}(P,\sigma) = \sup_{\mathrm{E}[|X|^2] \le P} I(X;Y). \tag{7}$$



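For small $\sigma$, we have

$$C_{\mathbb{C}}(P,\sigma) = \log\frac{P}{\sigma^2} + \log(\pi e) - h(W) + o(1) \tag{8}$$

where the $o(1)$-term vanishes as $\sigma$ tends to zero. (Here $\log(\cdot)$ denotes the natural logarithm and $h(\cdot)$ denotes differential entropy.) The capacity loss (6) can thus be written as

$$L = \log P + \log(\pi e) - h(W) - \lim_{\sigma \downarrow 0} \Bigl\{ \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} I(X;Y) - \log\frac{1}{\sigma^2} \Bigr\}. \tag{9}$$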

By choosing an input distribution that does not depend on $\sigma$, we can achieve$^1$

$$L \le \log P + \log(\pi e) - \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} h(X). \tag{10}$$

Indeed, we have

$$I(X;Y) = h(X + \sigma W) - h(W) + \log\frac{1}{\sigma^2} \tag{11}$$

which follows from the behavior of differential entropy under deterministic translation and under scaling by a complex number. Extending [10, Lemma 6.9] (see also [11]) to complex random variables then yields that, for every $\mathrm{E}[|X|^2] < \infty$ and $\mathrm{E}[|W|^2] < \infty$, the first differential entropy on the right-hand side (RHS) of (11) satisfies

$$\lim_{\sigma \downarrow 0} h(X + \sigma W) = h(X). \tag{12}$$
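Identities (11) and (12) can be checked in closed form in one special case. The sketch below assumes that both $X$ and $W$ are circularly-symmetric complex Gaussian, an illustrative choice only (such an $X$ does not have bounded support), so that $X + \sigma W$ is Gaussian with variance $P + \sigma^2$. The function names are hypothetical.

```python
import math

def h_complex_gaussian(variance):
    """Differential entropy (nats) of a circularly-symmetric complex Gaussian."""
    return math.log(math.pi * math.e * variance)

def mutual_information_via_11(P, sigma):
    # RHS of (11) with X ~ CN(0, P) and W ~ CN(0, 1),
    # so that X + sigma * W ~ CN(0, P + sigma^2).
    return (h_complex_gaussian(P + sigma ** 2)
            - h_complex_gaussian(1.0)
            + math.log(1.0 / sigma ** 2))

def entropy_gap_12(P, sigma):
    # h(X + sigma W) - h(X), which (12) says vanishes as sigma tends to zero.
    return h_complex_gaussian(P + sigma ** 2) - h_complex_gaussian(P)

P = 2.0
for sigma in (1.0, 0.1, 0.01):
    capacity_1 = math.log(1.0 + P / sigma ** 2)   # formula (1)
    print(sigma, capacity_1, mutual_information_via_11(P, sigma),
          entropy_gap_12(P, sigma))
```

In this Gaussian case the RHS of (11) coincides with (1) for every $\sigma$, and the entropy gap equals $\log(1 + \sigma^2/P)$, which vanishes as $\sigma$ tends to zero.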



Consequently, we obtain 

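$$\begin{aligned}
\lim_{\sigma \downarrow 0} \Bigl\{ \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} I(X;Y) - \log\frac{1}{\sigma^2} \Bigr\}
&\ge \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \lim_{\sigma \downarrow 0} \Bigl\{ I(X;Y) - \log\frac{1}{\sigma^2} \Bigr\} \\
&= \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} h(X) - h(W)
\end{aligned} \tag{13}$$

which together with (9) yields (10).

$^1$We define $h(X) = -\infty$ if the distribution of $X$ is not absolutely continuous with respect to the Lebesgue measure.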

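Let $P_{\mathrm{U}}$ denote the average power of a random variable that is uniformly distributed over $\mathcal{S}$, i.e.,

$$P_{\mathrm{U}} \triangleq \frac{\int_{\mathcal{S}} |x|^2 \, dx}{\int_{\mathcal{S}} dx}. \tag{14}$$

A small modification of the proof in [12, Th. 12.1.1] shows that the density that maximizes $h(X)$ for $X \in \mathcal{S}$ with probability one and $\mathrm{E}[|X|^2] \le P$ has the form

$$f^{\star}(x) = \frac{e^{-\lambda |x|^2}}{\int_{\mathcal{S}} e^{-\lambda |x'|^2} \, dx'} \, I\{x \in \mathcal{S}\}, \quad x \in \mathbb{C} \tag{15}$$

where $\lambda = 0$ for $P \ge P_{\mathrm{U}}$, and where $\lambda$ satisfies

$$\frac{\int_{\mathcal{S}} |x|^2 e^{-\lambda |x|^2} \, dx}{\int_{\mathcal{S}} e^{-\lambda |x'|^2} \, dx'} = P \tag{16}$$

for $P < P_{\mathrm{U}}$. Here $I\{\text{statement}\}$ denotes the indicator function: it is equal to one if the statement in the brackets is true and it is otherwise equal to zero.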
Applying (15) to (10) yields

$$L \le \log P + \log(\pi e) - \log\Bigl( \int_{\mathcal{S}} e^{-\lambda |x|^2} \, dx \Bigr) - \lambda P. \tag{17}$$

For $P = P_{\mathrm{U}}$ (and hence $\lambda = 0$), this becomes

$$L \le \log(\pi e) + \log\Bigl( \int_{\mathcal{S}} |x|^2 \, dx \Bigr) - 2 \log\Bigl( \int_{\mathcal{S}} dx \Bigr). \tag{18}$$

Specializing (18) to a square signal constellation (3) yields (irrespective of $A$)

$$L_{\square} \le \log\frac{\pi e}{6} \tag{19}$$

which corresponds to a power loss of roughly 1.53 dB. Hence, we recover the rule of thumb that "square signal constellations have a 1.53 dB power loss at high signal-to-noise ratio."

For a circular signal constellation (4), the upper bound (18) becomes (irrespective of $R$)

$$L_{\circ} \le \log\frac{e}{2} \tag{20}$$

recovering the power loss of 1.33 dB.
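The specializations (19) and (20) can be reproduced numerically from the RHS of (18). The sketch below approximates the two integrals over the region by a midpoint rule on a grid; the function names, the grid size and the conversion from nats to dB (a rate loss of $L$ nats is read as a power factor $e^{L}$, as in the high-SNR expression (8)) are illustrative assumptions.

```python
import math

def uniform_input_loss(inside, half_width, n=400):
    """Evaluate the RHS of (18) for a region given by the predicate `inside`.

    The integrals over the region are approximated by a midpoint rule on
    an n-by-n grid covering [-half_width, half_width]^2.
    """
    step = 2.0 * half_width / n
    cell = step * step
    area = 0.0
    second_moment = 0.0
    for i in range(n):
        re = -half_width + (i + 0.5) * step
        for j in range(n):
            im = -half_width + (j + 0.5) * step
            if inside(re, im):
                area += cell
                second_moment += (re * re + im * im) * cell
    return (math.log(math.pi * math.e) + math.log(second_moment)
            - 2.0 * math.log(area))

def nats_to_db(loss_nats):
    # A rate loss of L nats is interpreted as a power factor exp(L).
    return 10.0 * math.log10(math.exp(loss_nats))

square = uniform_input_loss(lambda re, im: True, half_width=1.0)
circle = uniform_input_loss(lambda re, im: re * re + im * im < 1.0,
                            half_width=1.0)
print(nats_to_db(square), nats_to_db(circle))
```

Changing `half_width` (that is, $A$ or $R$) leaves both values unchanged up to discretization error, in line with the remark that the bounds hold irrespective of $A$ and $R$.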

The inequality in (17) holds with equality if the capacity-achieving input distribution does not depend on $\sigma$, cf. (13). However, this is in general not the case. For example, for circularly-symmetric Gaussian noise and a circular signal constellation (4), it was shown by Shamai and Bar-David [13] that, for every $\sigma > 0$, the capacity-achieving input distribution is discrete in magnitude, with the number of mass points growing with vanishing $\sigma$. Nevertheless, the following theorem demonstrates that the RHS of (17) is indeed the capacity loss.

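Theorem 1 (Main Result): For the above channel model, we have

$$L = \log P + \log(\pi e) - \log\Bigl( \int_{\mathcal{S}} e^{-\lambda |x|^2} \, dx \Bigr) - \lambda P \tag{21}$$

where $\lambda = 0$ for $P \ge P_{\mathrm{U}}$, and where $\lambda$ satisfies (16) for $P < P_{\mathrm{U}}$.

Proof: See Section III. ■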
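For $P < P_{\mathrm{U}}$ the parameter $\lambda$ is only defined implicitly by (16), so evaluating (21) requires a root search. The sketch below does this for the square region (3), where the integrals factorize over the real and imaginary parts. Bisection, the midpoint rule, and all function names are illustrative choices and are not taken from the paper.

```python
import math

def line_integrals(lam, A, n=4000):
    """Midpoint-rule values of int e^{-lam u^2} du and int u^2 e^{-lam u^2} du on [-A, A]."""
    step = 2.0 * A / n
    z = 0.0
    m2 = 0.0
    for i in range(n):
        u = -A + (i + 0.5) * step
        w = math.exp(-lam * u * u) * step
        z += w
        m2 += u * u * w
    return z, m2

def input_power(lam, A):
    # For the square, density (15) factorizes over real and imaginary parts,
    # so E[|X|^2] is twice the one-dimensional second moment.
    z, m2 = line_integrals(lam, A)
    return 2.0 * m2 / z

def solve_lambda(P, A):
    """Return lambda >= 0 satisfying (16), or 0 if P >= P_U = 2 A^2 / 3."""
    if P >= 2.0 * A * A / 3.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while input_power(hi, A) > P:    # the power decreases as lambda grows
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if input_power(mid, A) > P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def capacity_loss_square(P, A):
    """RHS of (21) in nats for the square region (3)."""
    lam = solve_lambda(P, A)
    z, _ = line_integrals(lam, A)
    # The two-dimensional integral of e^{-lam |x|^2} over the square is z^2.
    return math.log(P) + math.log(math.pi * math.e) - 2.0 * math.log(z) - lam * P

print(capacity_loss_square(2.0 / 3.0, 1.0))   # P = P_U, lambda = 0
print(capacity_loss_square(0.05, 1.0))        # P well below P_U
```

With $A = 1$ and $P = P_{\mathrm{U}} = 2/3$ the result agrees with $\log(\pi e/6)$, and for $P$ far below $P_{\mathrm{U}}$ the loss is close to zero, since the maximizing density is then essentially an untruncated Gaussian.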



Note 1: It is not difficult to adapt the proof of Theorem 1 to other regions $\mathcal{S}$ and moment constraints. For example, the same proof technique can be used to derive the capacity loss when $\mathcal{S}$ is a Borel subset of the real numbers and the channel input's first moment is limited, i.e., $\mathrm{E}[|X|] \le A$.

Equations (11)–(13) demonstrate that the capacity loss (21) can be achieved with a continuous-valued channel input having density $f^{\star}(\cdot)$. Using the lower-semicontinuity of relative entropy [14], it can be further shown that (21) can also be achieved by any sequence of discrete channel inputs $\{X_N\}$ for which the number of mass points $N$ grows with vanishing $\sigma$, provided that

$$X_N \xrightarrow{d} X^{\star} \quad \text{as } N \to \infty \tag{22}$$

where $X^{\star}$ is a continuous random variable having density $f^{\star}(\cdot)$. (Here $\xrightarrow{d}$ denotes convergence in distribution.) Such a sequence can, for example, be obtained by approximating the distribution function corresponding to $f^{\star}(\cdot)$ by two-dimensional step functions.
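As an illustration of such a sequence, consider the uniform density on a square ($\lambda = 0$). Placing one equiprobable mass point at the midpoint of each cell of an $M \times M$ partition gives an $M^2$-point QAM-like constellation. This construction and the function names below are assumptions made for illustration; the sketch only checks that the average power approaches $P_{\mathrm{U}} = 2A^2/3$ and does not verify convergence in distribution itself.

```python
def qam_points(M, A):
    """M*M points at the cell midpoints of an M-by-M partition of [-A, A]^2."""
    coords = [-A + (2 * i + 1) * A / M for i in range(M)]
    return [complex(re, im) for re in coords for im in coords]

def average_power(points):
    # Average of |x|^2 over equiprobable points.
    return sum(abs(x) ** 2 for x in points) / len(points)

A = 1.0
for M in (2, 8, 32):
    print(M * M, average_power(qam_points(M, A)))
```

For this grid the average power equals $\tfrac{2A^2}{3}(1 - 1/M^2)$, which increases to $P_{\mathrm{U}}$ as the number of points grows.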

III. Proof of Theorem 1

In view of (9), in order to prove Theorem 1 it suffices to show that

$$\lim_{\sigma \downarrow 0} \Bigl\{ \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} I(X;Y) - \log\frac{1}{\sigma^2} \Bigr\} \le \log\Bigl( \int_{\mathcal{S}} e^{-\lambda |x|^2} \, dx \Bigr) + \lambda P - h(W). \tag{23}$$

The claim then follows by combining (23) with (17). To this end, we use the upper bound on the mutual information [10, Th. 5.1]

$$I(X;Y) \le \int D\bigl( W(\cdot|x) \,\big\|\, R(\cdot) \bigr) \, dQ(x) \tag{24}$$

where $Q(\cdot)$ denotes the input distribution; $W(\cdot|x)$ denotes the conditional distribution of the channel output, conditioned on $X = x$; and $R(\cdot)$ denotes some arbitrary distribution on the output alphabet. Every choice of $R(\cdot)$ yields an upper bound on $I(X;Y)$, and the inequality in (24) holds with equality if $R(\cdot)$ is the actual distribution of $Y$ induced by $Q(\cdot)$ and $W(\cdot|\cdot)$.

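To derive an upper bound on $I(X;Y)$, we apply (24) with $R(\cdot)$ having density

$$r(y) = \begin{cases} \dfrac{1}{K_{\epsilon,\sigma}} \, e^{-\lambda |y|^2}, & y \in \mathcal{S}_{\epsilon} \\[2ex] \dfrac{1}{K_{\epsilon,\sigma}} \, \dfrac{1}{\pi^2 \sigma |y|} \, \dfrac{1}{1 + |y|^2/\sigma^2}, & y \notin \mathcal{S}_{\epsilon} \end{cases} \tag{25}$$

where

$$K_{\epsilon,\sigma} \triangleq \int_{\mathcal{S}_{\epsilon}} e^{-\lambda |y|^2} \, dy + \int_{\mathcal{S}_{\epsilon}^{c}} \frac{1}{\pi^2 \sigma |y|} \, \frac{1}{1 + |y|^2/\sigma^2} \, dy \tag{26}$$

is a normalizing constant; where $\mathcal{S}_{\epsilon}$ denotes the $\epsilon$-neighborhood of $\mathcal{S}$

$$\mathcal{S}_{\epsilon} \triangleq \{ y \in \mathbb{C}\colon |y - x'| \le \epsilon, \text{ for some } x' \in \mathcal{S} \}; \tag{27}$$

where $\mathcal{S}_{\epsilon}^{c}$ denotes the complement of $\mathcal{S}_{\epsilon}$; and where $\lambda$ is zero for $P \ge P_{\mathrm{U}}$ and satisfies (16) for $P < P_{\mathrm{U}}$. Some useful properties of $K_{\epsilon,\sigma}$ are summarized in the following lemma.

Lemma 2: The normalizing constant $K_{\epsilon,\sigma}$ satisfies

$$\inf_{\epsilon > 0,\ \sigma > 0} K_{\epsilon,\sigma} > 0 \tag{28a}$$

$$\lim_{\epsilon \downarrow 0} \lim_{\sigma \downarrow 0} K_{\epsilon,\sigma} = \int_{\mathcal{S}} e^{-\lambda |y|^2} \, dy. \tag{28b}$$

Proof: Omitted. ■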
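The behavior of the normalizing constant as $\sigma$ vanishes is driven by the heavy-tailed part of the density. As a sketch, take the tail term to be $1/(\pi^2 \sigma |y| (1 + |y|^2/\sigma^2))$, which is our reading of the damaged equations (25) and (26). In polar coordinates its integral over $\{|y| > \epsilon\}$, a set that contains $\mathcal{S}_{\epsilon}^{c}$ because $0 \in \mathcal{S}$, equals $1 - \tfrac{2}{\pi}\tan^{-1}(\epsilon/\sigma)$. The code compares a numerical integration with this closed form; the function names and the truncation radius are hypothetical.

```python
import math

def tail_mass_numeric(eps, sigma, n=50000, r_max_factor=1e6):
    """Integrate the tail term over {|y| > eps} in polar coordinates.

    After integrating out the angle, the radial integrand is
    2 / (pi * sigma * (1 + r^2 / sigma^2)). A midpoint rule in t = log r
    is used, truncated at r_max = r_max_factor * sigma.
    """
    t_lo = math.log(eps)
    t_hi = math.log(r_max_factor * sigma)
    dt = (t_hi - t_lo) / n
    total = 0.0
    for i in range(n):
        r = math.exp(t_lo + (i + 0.5) * dt)
        total += 2.0 / (math.pi * sigma * (1.0 + (r / sigma) ** 2)) * r * dt
    return total

def tail_mass_closed_form(eps, sigma):
    return 1.0 - (2.0 / math.pi) * math.atan(eps / sigma)

print(tail_mass_numeric(0.05, 0.01), tail_mass_closed_form(0.05, 0.01))
print(tail_mass_closed_form(0.05, 1e-4))
```

For fixed $\epsilon$ the tail mass vanishes as $\sigma$ tends to zero, which is consistent with the limit (28b), where only the integral over $\mathcal{S}$ remains.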
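We return to the analysis of $I(X;Y)$ and apply (24) together with the density (25) to express the upper bound as

$$\int D\bigl( W(\cdot|x) \,\big\|\, R(\cdot) \bigr) \, dQ(x) = -h(Y|X) - \iint p(y|x) \log r(y) \, dy \, dQ(x) \tag{29}$$

where $p(y|x)$ denotes the conditional probability density function of $Y$, conditioned on $X = x$.

Evaluation of the conditional differential entropy gives

$$h(Y|X) = h(W) - \log\frac{1}{\sigma^2} \tag{30}$$

and some algebra applied to the second summand in (29) allows us to write it as

$$\begin{aligned}
-\iint p(y|x) \log r(y) \, dy \, dQ(x)
={} & \log K_{\epsilon,\sigma} + \lambda \, \mathrm{E}\bigl[ |Y|^2 I\{Y \in \mathcal{S}_{\epsilon}\} \bigr] \\
& + \log(\pi^2 \sigma^2) \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \\
& + \mathrm{E}\Bigl[ \log\frac{|Y|}{\sigma} \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr] \\
& + \mathrm{E}\Bigl[ \log\Bigl( 1 + \frac{|Y|^2}{\sigma^2} \Bigr) I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr].
\end{aligned} \tag{31}$$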



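Combining (30) and (31) with (29) and (24) yields

$$\begin{aligned}
I(X;Y) \le{} & -h(W) + \log\frac{1}{\sigma^2} + \log K_{\epsilon,\sigma} + \lambda \, \mathrm{E}\bigl[ |Y|^2 I\{Y \in \mathcal{S}_{\epsilon}\} \bigr] \\
& + \log(\pi^2 \sigma^2) \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) + \mathrm{E}\Bigl[ \log\frac{|Y|}{\sigma} \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr] \\
& + \mathrm{E}\Bigl[ \log\Bigl( 1 + \frac{|Y|^2}{\sigma^2} \Bigr) I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr].
\end{aligned} \tag{32}$$

We next show that, for $\epsilon > 0$,

$$\lim_{\sigma \downarrow 0} \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \mathrm{E}\bigl[ |Y|^2 I\{Y \in \mathcal{S}_{\epsilon}\} \bigr] \le P \tag{33a}$$

$$\lim_{\sigma \downarrow 0} \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \bigl\{ \log(\pi^2 \sigma^2) \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \bigr\} = 0 \tag{33b}$$

$$\lim_{\sigma \downarrow 0} \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \mathrm{E}\Bigl[ \log\frac{|Y|}{\sigma} \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr] = 0 \tag{33c}$$

$$\lim_{\sigma \downarrow 0} \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \mathrm{E}\Bigl[ \log\Bigl( 1 + \frac{|Y|^2}{\sigma^2} \Bigr) I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr] = 0. \tag{33d}$$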



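The first claim (33a) follows by upper-bounding $I\{Y \in \mathcal{S}_{\epsilon}\} \le 1$:

$$\begin{aligned}
\sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \mathrm{E}\bigl[ |Y|^2 I\{Y \in \mathcal{S}_{\epsilon}\} \bigr]
&\le \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \mathrm{E}\bigl[ |Y|^2 \bigr] \\
&= \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \bigl\{ \mathrm{E}[|X|^2] + \sigma^2 \mathrm{E}[|W|^2] \bigr\} \\
&\le P + \sigma^2
\end{aligned} \tag{34}$$

where the second step follows because $X$ and $W$ are independent and $W$ is centered, and the third step follows because $\mathrm{E}[|X|^2] \le P$ and $\mathrm{E}[|W|^2] = 1$.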

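To prove (33b), we first note that

$$\Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \le \Pr(\sigma |W| > \epsilon). \tag{35}$$

Indeed, if $|\sigma w| \le \epsilon$, then we have $|y - x'| = |x + \sigma w - x'| \le \epsilon$ for $x' = x \in \mathcal{S}$, so $y \in \mathcal{S}_{\epsilon}$. By Chebyshev's inequality [15, Sec. 5.4], this can be further upper-bounded by

$$\Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \le \frac{\sigma^2}{\epsilon^2}. \tag{36}$$

It then follows that, for $\sigma \le 1/\pi$,

$$0 \le -\log(\pi^2 \sigma^2) \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \le -\log(\pi^2 \sigma^2) \, \frac{\sigma^2}{\epsilon^2} \tag{37}$$

where the right-most term vanishes as $\sigma$ tends to zero. This proves (33b).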

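We next turn to (33c). We first note that every $y \in \mathcal{S}_{\epsilon}^{c}$ must satisfy $|y| > \epsilon$, since otherwise $|y - x'| \le \epsilon$ for $x' = 0$, which by assumption is in $\mathcal{S}$. Therefore,

$$\mathrm{E}\Bigl[ \log\frac{|Y|}{\sigma} \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr] \ge \log\frac{\epsilon}{\sigma} \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \ge 0, \quad \text{for } \sigma \le \epsilon. \tag{38}$$

To prove (33c), it thus remains to show that

$$\lim_{\sigma \downarrow 0} \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} \mathrm{E}\Bigl[ \log\frac{|Y|}{\sigma} \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr] \le 0. \tag{39}$$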



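By Jensen's inequality, we have

$$\begin{aligned}
\mathrm{E}\Bigl[ \log\frac{|Y|}{\sigma} \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr]
&\le \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \log\Bigl( \frac{\mathrm{E}\bigl[ |Y| \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \bigr]}{\sigma \Pr(Y \in \mathcal{S}_{\epsilon}^{c})} \Bigr) \\
&\le \frac{1}{2} \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \log\Bigl( \frac{P + \sigma^2}{\sigma^2 \Pr(Y \in \mathcal{S}_{\epsilon}^{c})} \Bigr)
\end{aligned} \tag{40}$$

where the last step follows from the Cauchy-Schwarz inequality

$$\mathrm{E}\bigl[ |Y| \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \bigr] \le \sqrt{ \mathrm{E}\bigl[ |Y|^2 \bigr] \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) } \tag{41}$$

together with $\mathrm{E}[|Y|^2] \le P + \sigma^2$, cf. (34).

Using (36) together with the fact that $\xi \mapsto -\xi \log \xi$ is monotonically increasing for $\xi \le e^{-1}$, we obtain for $\sigma \le \epsilon e^{-1/2}$

$$\mathrm{E}\Bigl[ \log\frac{|Y|}{\sigma} \, I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr] \le \frac{\sigma^2}{2 \epsilon^2} \log\Bigl( 1 + \frac{P}{\sigma^2} \Bigr) - \frac{\sigma^2}{2 \epsilon^2} \log\frac{\sigma^2}{\epsilon^2} \tag{42}$$

from which (39), and hence (33c), follows by noting that the RHS of (42) vanishes as $\sigma$ tends to zero.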

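To prove (33d), we use Jensen's inequality and (34) to obtain

$$\begin{aligned}
\mathrm{E}\Bigl[ \log\Bigl( 1 + \frac{|Y|^2}{\sigma^2} \Bigr) I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr]
\le{} & \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \log\Bigl( 1 + \frac{\mathrm{E}\bigl[ |Y|^2 I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \bigr]}{\sigma^2 \Pr(Y \in \mathcal{S}_{\epsilon}^{c})} \Bigr) \\
\le{} & \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \log\Bigl( 1 + \frac{P}{\sigma^2} + \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \Bigr) \\
& - \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \log \Pr(Y \in \mathcal{S}_{\epsilon}^{c}).
\end{aligned} \tag{43}$$

Using (36) together with the fact that $\xi \mapsto -\xi \log \xi$ is monotonically increasing for $\xi \le e^{-1}$, we obtain for $\sigma \le \epsilon e^{-1/2}$

$$\mathrm{E}\Bigl[ \log\Bigl( 1 + \frac{|Y|^2}{\sigma^2} \Bigr) I\{Y \in \mathcal{S}_{\epsilon}^{c}\} \Bigr] \le \frac{\sigma^2}{\epsilon^2} \log\Bigl( 1 + \frac{P}{\sigma^2} + \frac{\sigma^2}{\epsilon^2} \Bigr) - \frac{\sigma^2}{\epsilon^2} \log\frac{\sigma^2}{\epsilon^2} \tag{44}$$

from which (33d) follows by noting that the RHS of (44) vanishes as $\sigma$ tends to zero.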
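The three vanishing bounds used for (33b), (33c) and (33d) are easy to tabulate. The sketch below evaluates the right-most terms of (37), (42) and (44) as we read them from the damaged originals, for an arbitrary illustrative choice $\epsilon = 0.1$ and $P = 1$; the function names are hypothetical.

```python
import math

def bound_37(sigma, eps):
    # Right-most term of (37), valid for sigma <= 1/pi.
    return -math.log(math.pi ** 2 * sigma ** 2) * sigma ** 2 / eps ** 2

def bound_42(sigma, eps, P):
    # RHS of (42), valid for sigma <= eps * exp(-1/2).
    q = sigma ** 2 / eps ** 2
    return 0.5 * q * math.log(1.0 + P / sigma ** 2) - 0.5 * q * math.log(q)

def bound_44(sigma, eps, P):
    # RHS of (44), valid for sigma <= eps * exp(-1/2).
    q = sigma ** 2 / eps ** 2
    return q * math.log(1.0 + P / sigma ** 2 + q) - q * math.log(q)

eps, P = 0.1, 1.0
for sigma in (1e-2, 1e-3, 1e-4):
    print(sigma, bound_37(sigma, eps), bound_42(sigma, eps, P),
          bound_44(sigma, eps, P))
```

All three terms behave like $\sigma^2 \log(1/\sigma^2)$ for fixed $\epsilon$, so they shrink quickly once $\sigma$ is well below $\epsilon$.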

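Combining (33a)–(33d) with (32) yields

$$\begin{aligned}
\lim_{\sigma \downarrow 0} \Bigl\{ \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} I(X;Y) - \log\frac{1}{\sigma^2} \Bigr\}
&\le -h(W) + \lim_{\sigma \downarrow 0} \log K_{\epsilon,\sigma} + \lambda P \\
&= -h(W) + \log\Bigl( \lim_{\sigma \downarrow 0} K_{\epsilon,\sigma} \Bigr) + \lambda P
\end{aligned} \tag{45}$$

where the last equation follows from the continuity of $x \mapsto \log(x)$ for $x > 0$. Letting $\epsilon$ tend to zero, and using (28b) in Lemma 2, we prove (23) and therefore the desired result

$$L = \log P + \log(\pi e) - \log\Bigl( \int_{\mathcal{S}} e^{-\lambda |x|^2} \, dx \Bigr) - \lambda P. \tag{46}$$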

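IV. Nonasymptotic Capacity Loss

A natural approach to prove Theorem 1 would be to generalize (12) to

$$\lim_{\sigma \downarrow 0} \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} h(X + \sigma W) = \sup_{X \in \mathcal{S},\ \mathrm{E}[|X|^2] \le P} h(X). \tag{47}$$

While this approach may seem simpler, our approach has the advantage that it also allows for a lower bound on the nonasymptotic capacity loss

$$L(\sigma) \triangleq C_{\mathbb{C}}(P,\sigma) - C_{\mathcal{S}}(P,\sigma), \quad \sigma > 0. \tag{48}$$

Indeed, combining (43), (40), and (34) with (32) yields

$$\begin{aligned}
I(X;Y) \le{} & -h(W) + \log\frac{1}{\sigma^2} + \log K_{\epsilon,\sigma} + \lambda (P + \sigma^2) \\
& + \log^{+}(\pi^2 \sigma^2) \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \\
& + \frac{1}{2} \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \log\Bigl( 1 + \frac{P}{\sigma^2} \Bigr) \\
& + \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \log\Bigl( 1 + \frac{P}{\sigma^2} + \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \Bigr) \\
& - \frac{3}{2} \Pr(Y \in \mathcal{S}_{\epsilon}^{c}) \log \Pr(Y \in \mathcal{S}_{\epsilon}^{c})
\end{aligned} \tag{49}$$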



Fig. 1. The capacity loss $L(\sigma)$ for circularly-symmetric Gaussian noise and square constellations with $P = P_{\mathrm{U}}$. [Plot not reproduced: the horizontal axis is $1/\sigma^2$ in dB, ranging from $-10$ to 70; the asymptotic capacity loss $L$ is marked.]



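where $\log^{+}(\xi) \triangleq \max\{0, \log \xi\}$, $\xi > 0$. By upper-bounding

$$K_{\epsilon,\sigma} \le \int_{\mathcal{S}_{\epsilon}} e^{-\lambda |y|^2} \, dy + 1 - \frac{2}{\pi} \tan^{-1}\Bigl( \frac{\epsilon}{\sigma} \Bigr) \tag{50}$$

(where $\tan^{-1}(\cdot)$ denotes the arctangent function), and by using (35) together with the fact that $\xi \mapsto -\xi \log \xi$ is monotonically increasing for $\xi \le e^{-1}$ and that $-\xi \log \xi \le 1/e$ for $0 < \xi < 1$, we obtain, upon minimizing over $\epsilon > 0$,

$$\begin{aligned}
C_{\mathcal{S}}(P,\sigma) \le \inf_{\epsilon > 0} \Bigl\{ & -h(W) + \log\frac{1}{\sigma^2} + \lambda (P + \sigma^2) \\
& + \log\Bigl( \int_{\mathcal{S}_{\epsilon}} e^{-\lambda |y|^2} \, dy + 1 - \frac{2}{\pi} \tan^{-1}\Bigl( \frac{\epsilon}{\sigma} \Bigr) \Bigr) \\
& + \log^{+}(\pi^2 \sigma^2) \Pr(\sigma |W| > \epsilon) \\
& + \frac{1}{2} \Pr(\sigma |W| > \epsilon) \log\Bigl( 1 + \frac{P}{\sigma^2} \Bigr) \\
& + \Pr(\sigma |W| > \epsilon) \log\Bigl( 1 + \frac{P}{\sigma^2} + \Pr(\sigma |W| > \epsilon) \Bigr) \\
& - \frac{3}{2} \Pr(\sigma |W| > \epsilon) \log\bigl( \Pr(\sigma |W| > \epsilon) \bigr) \, I\{ \Pr(\sigma |W| > \epsilon) \le 1/e \} \\
& + \frac{3}{2e} \, I\{ \Pr(\sigma |W| > \epsilon) > 1/e \} \Bigr\}.
\end{aligned} \tag{51}$$

This together with (48) yields a lower bound on $L(\sigma)$.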
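The bound can be evaluated numerically. The sketch below follows our reading of (51) for circularly-symmetric Gaussian noise and a square region with $P = P_{\mathrm{U}}$ (so $\lambda = 0$). It assumes $h(W) = \log(\pi e)$ and $\Pr(\sigma|W| > \epsilon) = e^{-\epsilon^2/\sigma^2}$, which hold for this noise, uses the area $4A^2 + 8A\epsilon + \pi\epsilon^2$ of the $\epsilon$-neighborhood of the square, and replaces the infimum over $\epsilon$ by a search over a finite logarithmic grid. Function names and the grid are hypothetical, and the values are not claimed to match the published figure.

```python
import math

def upper_bound_51(inv_sigma2_db, A=math.sqrt(1.5)):
    """RHS of (51), as read here, for a square region with P = P_U (lambda = 0).

    Assumptions: W is circularly-symmetric Gaussian, so h(W) = log(pi e)
    and Pr(sigma |W| > eps) = exp(-eps^2 / sigma^2); the infimum over eps
    is replaced by a minimum over a finite logarithmic grid.
    """
    sigma2 = 10.0 ** (-inv_sigma2_db / 10.0)
    sigma = math.sqrt(sigma2)
    P = 2.0 * A * A / 3.0
    best = math.inf
    for k in range(-40, 121):
        ratio = 10.0 ** (k / 20.0)             # eps / sigma
        eps = ratio * sigma
        p = math.exp(-ratio * ratio)           # may underflow to 0.0
        # Area of the eps-neighborhood of a square with side 2A.
        area = 4.0 * A * A + 8.0 * A * eps + math.pi * eps * eps
        value = (-math.log(math.pi * math.e) + math.log(1.0 / sigma2)
                 + math.log(area + 1.0 - (2.0 / math.pi) * math.atan(ratio))
                 + max(0.0, math.log(math.pi ** 2 * sigma2)) * p
                 + 0.5 * p * math.log(1.0 + P / sigma2)
                 + p * math.log(1.0 + P / sigma2 + p))
        if p > 1.0 / math.e:
            value += 3.0 / (2.0 * math.e)
        elif p > 0.0:
            value -= 1.5 * p * math.log(p)
        best = min(best, value)
    return best, P, sigma2

def loss_lower_bound(inv_sigma2_db):
    bound, P, sigma2 = upper_bound_51(inv_sigma2_db)
    # For Gaussian noise, C_C(P, sigma) = log(1 + P / sigma^2), cf. (1).
    return math.log(1.0 + P / sigma2) - bound

for db in (10, 30, 50, 70):
    print(db, loss_lower_bound(db))
```

Because the grid search only restricts the set of $\epsilon$ values, the computed quantity is still a valid lower bound on $L(\sigma)$ under the stated assumptions; it approaches $\log(\pi e/6)$ slowly as $1/\sigma^2$ grows.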

Figure 1 shows the lower bound on $L(\sigma)$ for circularly-symmetric Gaussian noise and a square signal constellation (3) with $P = P_{\mathrm{U}}$. It further shows the information-rate losses of $2^m$-ary quadrature amplitude modulation (QAM) for $m = 10$, 16, and 22, which were numerically obtained using
Gauss-Hermite quadratures [16], as described for example
in [11, Sec. III]. Since for a fixed $m$ the information rate
corresponding to $2^m$-ary QAM is bounded by $m$ bits, the
rate loss of $2^m$-ary QAM tends to infinity as $\sigma$ tends to
zero. We observe that the lower bound on $L(\sigma)$ converges to
$L = \log(\pi e/6) \approx 0.353$ nats as $\sigma$ tends to zero, but is rather loose
for finite $\sigma$. However, in the proof of Theorem 1 we chose the
density (25) to decay sufficiently slowly, so as to ensure that
the lower bound on $L$ holds for every unit-variance noise of
finite differential entropy. For Gaussian noise, a density can be
chosen that decays much faster, giving rise to a tighter bound.